…machine learning…
…Big Data architectures…
…formal statistical inference…
Image from http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram

Statistics courses often focus on the Model step; this course aims to focus on the other steps.
Image from http://r4ds.had.co.nz/introduction.html
Reproducibility is the ability to get the same research results or inferences, based on the raw data and computer programs provided by researchers.
Compare replicability: being able to reproduce the result in an independent experiment.
Everything written in code (no copy-paste of values/tables/figures)
Portable (the code should execute, not only on your computer today)
Accessible (available openly)
Fully automated from raw-data to report
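A minimal sketch of the "everything written in code" principle, using the built-in mtcars data set as a stand-in for raw data. The variable names and the sentence are illustrative assumptions, not part of the course material.

```r
# Compute the reported number in code instead of typing it by hand.
mean_mpg <- mean(mtcars$mpg)

# Build the sentence for the report from the computed value, so the text
# updates automatically if the data change.
report_line <- sprintf(
  "The mean fuel consumption is %.1f miles per gallon.",
  mean_mpg
)
print(report_line)
```

If the data change, rerunning the script updates the sentence, and nothing has to be copied by hand.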
summary(mtcars$mpg)
summary(mtcars$"mpg")
summary(mtcars[, "mpg"])
summary(mtcars["mpg"])
summary(mtcars[["mpg"]])
summary(mtcars[1])
summary(mtcars[, 1])
summary(mtcars[[1]])
with(mtcars, summary(mpg))
attach(mtcars); summary(mpg)
summary(subset(mtcars, select=mpg))
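The many base R variants above do not all pass the same kind of object to summary(). The following sketch (variable names are illustrative) checks which forms extract a numeric vector and which keep a one-column data frame.

```r
# Forms that extract the column as a plain numeric vector.
vec_dollar  <- mtcars$mpg
vec_double  <- mtcars[["mpg"]]
vec_matrix  <- mtcars[, 1]

# Forms that keep a data frame with a single column.
df_single   <- mtcars["mpg"]
df_subset   <- subset(mtcars, select = mpg)

# The vector forms are interchangeable.
same_vectors <- identical(vec_dollar, vec_double) &&
  identical(vec_double, vec_matrix)

# The data frame forms are a different class, so summary() dispatches
# to a different method and formats its output differently.
print(same_vectors)
print(class(vec_dollar))
print(class(df_single))
```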
A suite of R packages heavily influenced by Hadley Wickham at RStudio. It is the focus of this course.
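For comparison with the base R calls, here is a hedged sketch of the same kind of summary written with dplyr. It assumes that the suite referred to is the tidyverse and that the dplyr package is installed; the column names in the result are illustrative.

```r
library(dplyr)

# One explicit pipeline: start from the data, then summarise one column.
mpg_summary <- mtcars %>%
  summarise(
    min    = min(mpg),
    median = median(mpg),
    mean   = mean(mpg),
    max    = max(mpg)
  )

print(mpg_summary)
```

The result is itself a data frame with one row, which makes it easy to pass on to a table or a plot.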
We need to be able to automatically combine text, results, tables and figures.
Image from https://rosannavanhespenresearch.files.wordpress.com/
A markup language for typesetting web pages.
Adds executable code to Markdown.
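As a sketch of what such a document can look like, the R code below writes a small R Markdown file to a temporary directory. The file name, title and chunk label are hypothetical. Rendering is left as a comment because it additionally assumes that the rmarkdown package and pandoc are installed.

```r
# Lines of a minimal R Markdown document: a YAML header, text with
# inline R code, and one executable code chunk.
rmd_lines <- c(
  "---",
  "title: \"Fuel consumption\"",
  "output: html_document",
  "---",
  "",
  "The data set contains `r nrow(mtcars)` cars.",
  "",
  "```{r mpg-summary}",
  "summary(mtcars$mpg)",
  "```"
)

# Write the document to a temporary file (hypothetical file name).
rmd_file <- file.path(tempdir(), "report.Rmd")
writeLines(rmd_lines, rmd_file)

# To produce the report (requires rmarkdown and pandoc):
# rmarkdown::render(rmd_file)
```

When the document is rendered, the inline expression and the chunk are executed, and their results are placed in the output together with the text.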
Version control software and a web-based hosting service.
Image from http://phdcomics.com/comics/archive.php?comicid=1531
Not strictly necessary for reproducibility, but important for large projects involving multiple coders. In this course it is mainly a side effect of using GitHub for publication; version control is covered in more depth in Computer Science for Mathematicians (DA3018).
Also .Rproj for increased portability.
Everything written in code: R
Portable: .Rproj (RStudio)
Accessible: GitHub
Automated: R Markdown
Provides basic training and preparation for class activities. Not part of the examination.
19/11: Sebastian Tengborg, Data scientist at